Method, computer readable medium and system for tomographic reconstruction

ABSTRACT

The invention is a method for emission tomographic reconstruction from measurement data, the method comprising the steps of: obtaining the measurement data corresponding to activity of particle emissions of a volume from a tomographic imaging system, and performing at least one iteration step to obtain emission density data of particle emissions of the volume from the measurement data, wherein the at least one iteration step comprises a forward projection being a projection from the emission density data to the measurement data, and a back projection being a projection from the measurement data to the emission density data, and the at least one iteration step is carried out in a parallel hardware architecture, wherein both the forward projection and the back projection are of at least partially gathering type, and the measurement data is of binned type. The invention also relates to computer readable medium and a system for processing tomographic reconstruction from measurement data.

TECHNICAL FIELD

The invention relates to a method, a computer readable medium and asystem for tomographic reconstruction from measurement data.

BACKGROUND ART

In emission tomography the density data of particle emissions (in thefollowing: emission density data) in a volume has to be found on thebasis of measurement data. Emitting particles are for instance positronsor gamma photons. Emitted particles may get scattered or get absorbed,and may also change their type in the volume or before detection.Emitted particles or the particles originating from the emittedparticles arrive in detector devices that report detection events. Theabove mentioned physical phenomena complicate the tomographicreconstruction, and for an appropriate result these have to be takeninto account in the course of the reconstruction. An emissiontomographic reconstruction method reconstructs the emission density databased on the measurement data obtained from signals of detector devicescomprising a plurality of detector elements.

In Positron Emission Tomography (PET) the density of radioactive tracermaterials has to be found, which is proportional with its emissiondensity data. In the case of PET, the particles emitted from the tracermaterial are positrons, which are replaced by pairs of approximatelyoppositely directed gamma photons when they annihilate with electrons(cf. Physics reference manual, Geant4 9.1. Technical Report, CERN,2007). These gamma photons may get scattered in the body and/or in thedetector elements before they get finally absorbed in a detector elementcomprised in the detector device or module. The detector elements areusually in the form of detector crystals. In PET the detector deviceshould detect pairs of gamma photons arriving almost simultaneously, soan event occurs if two detector elements nearly simultaneously detecttwo photons. This means that in PET, a pair of detector elements,sometimes called detector crystals, represents a conceptual detector,which is also called Line Of Response (LOR).

In Single Photon Emission Tomography (SPECT), emitted particles aregamma photons. A particle may also get scattered or absorbed in theobject, in the collimator or in the detector crystals comprised in adetector device rotating around the object to be investigated. Adetector crystal—i.e. a detector element—is able to detect gamma photonsthat went through a collimator associated to it. The collimator can befor example parallel hole, pinhole, multi-pinhole, fan or cone beam,slit-slat type or any other type. Thus in SPECT, a conceptual detectoris defined by the actual orientation of the detector device, i.e. of itsdetector elements and the corresponding collimators. The linecharacterizing this orientation is called the projection line.

Two main types of the reconstruction methods are known. If the emissiondensity is reconstructed only from the number of events per conceptualdetector, then the reconstruction process is said to be based on binneddata. In a binned type reconstruction the inputs comprise the totalnumber of hits in conceptual detectors acquired during the measurement.Consequently, when binned reconstruction is applied, individual detectorevents undergo binning and the emission density data has to bereconstructed from the spatial histogram of photon pair hits in the caseof PET or from the photon hits in the case of SPECT.

However, if the reconstruction method considers events one-by-one intheir order, then the reconstruction is of a list mode. A list modereconstruction may involve more information, like the timing ofindividual events. The differences between binned and list modereconstruction are given in more detail below.

The required output of a reconstruction method of any type is theemission density function x({right arrow over (v)}) in points {rightarrow over (v)} of a volume of interest □, which is typicallydiscretized on a voxel grid. In the case of binned reconstruction theinput of the reconstruction method is the measurement data, the measurednumber of hits in the conceptual detectors, which is connected to theoutput by a system matrix. An element of the system matrix representsthe probability that a particle generated in a voxel is detected by aconceptual detector.

For typical systems, both the number of conceptual detectors and thenumber of voxels may be in the range of several hundred millions, thusthe system matrix has a very large size and the reconstruction methodmust scale up well and must be appropriate for high performancecomputation platforms. The known time-consuming processes of iterativetomography reconstruction are targeted by pre-computing system dependentinformation and always needed more efficient computational platforms (A.J. Reader and H. Zaidi: Advances in PET Image Reconstruction, PETClinics, Vol. 2, pp. 173-190 (2007)).

Using a pre-computed (U.S. Pat. No. 7,332,721 B2, S. Moehrs, M. Defrise,N. Belcari, A Del Guerra, A. Bartoli, S. Fabbri, and G. Zanetti:Multi-ray-based system matrix generation for 3D PET reconstruction,Phys. Med. Biol., Vol. 53, pp. 6925-6945 (2008) and J. Qi, R. M. Leahy,S. R. Cherry, A. Chatziioannou, and T. H. Farquhar: High-resolution 3DBayesian image reconstruction using the microPET small-animal scanner.Physics in Medicine and Biology, Vol. 43, pp. 1001-1013 (1998)) or ameasured (V. Y. Panin, F. Kehren, C. Michel, and M. Casey: Fully 3-D PETreconstruction with system matrix derived from point sourcemeasurements, IEEE Transactions on Medical Imaging, Vol. 25, pp. 907-921(2006)) system matrix seems to be an attractive option, but in highresolution real three-dimensional scanners this would pose prohibitivememory requirements disadvantageously, even if the matrix is compressed,decomposed as a product of a series of smaller matrices (J. Qi, R. M.Leahy, S. R. Cherry. A. Chatziioannou, and T. H. Farquhar:High-resolution 3D Bayesian image reconstruction using the microPETsmall-animal scanner, Physics in Medicine and Biology, Vol. 43, pp.1001-1013 (1998)), or symmetry is exploited (J. L. Herraiz. S. Espaa. S.Garcia, R. Cabido, A. S. Montemayor, M. Desco, J. J. Vaquero, and J. M.Udias: GPU acceleration of a fully 3D iterative reconstruction softwarefor PET using CUDA, in: Nuclear Science Symposium Conference Record(NSS/MIC), 2009 IEEE, pp. 4064-4067 (2009)). On the other hand, in caseof a pre-computed or measured system matrix, phenomena like object andmaterial dependent positron range, absorption or scattering cannot beconsidered, disadvantageously.

Among the high-performance computing possibilities, likeField-Programmable Gate Arrays (FPGA) (M. Leeser, S. Coric, E. Miller,H. Yu, and M. Trepanier: Parallel-beam back projection: an FPGAimplementation optimized for medical imaging, in: Proc. Tenth Int.Symposium on FPGA, pp. 217-226 (2002) and US 2008/0095300 A1), multi-CPUsystems (D. W. Shattuck, J. Rapela, E. Asma, A. Chatzioannou, J. Qi, andR. M. Leahy: Internet2-based 3D PET image reconstruction using a PCcluster, Physics in Medicine and Biology, Vol. 47, pp. 2785-2795(2002)), cell processors (M. Kacheiries, M. Knaup, and O. Bockenbach:Hyperfast parallel-beam and conebeam back projection using the CELLgeneral purpose hardware, Medical Physics, Vol. 34. pp. 1474-1486(2007)), and GPUs (Graphics Processing Units) (F. Xu and K. Mueller:Real-time 3D computed tomographic reconstruction using commoditygraphics hardware, Physics in Medicine and Biology, pp. 3405-3419(2007)), in a comparison to other architectures, the massively parallelGPU has proven to be the most cost-effective platform for tomographicreconstruction tasks (N. Gac, S. Mancini, M. Desvignes, and D. Houzet:High speed 3D tomography on CPU, GPU, and FPGA, EURASIP Journal onEmbedded Systems, Article ID 930250 (2008)).

GPUs can be programmed with two different programming models. ShaderAPIs (Application Programming Interface) present the GPU hardware as thedirect implementation of the incremental rendering pipeline (L.Szirmay-Kalos, L. Szcsi. M. Sbert: GPU-Based Techniques for GlobalIllumination Effects, Morgan and Claypool Publishers, San Rafael, USA(2008)), including both programmable and fixed processing stages. On theother hand, GPGPU (General Purpose GPU) APIs like CUDA (Compute UnifiedDevice Architecture) (NVIDIA, http://developer.nvidia.com/cuda, in: TheCUDA Homepage) and OpenCL (Open Computing Language) provide access tothe multiprocessors of the GPU where each multiprocessor contains a setof scalar processors sharing the instruction unit, and therefore actingas a SIMD (Single Instruction. Multiple Data) hardware. If only thegeometric projection is considered in a tomographic reconstruction, thenthe shader API implementation is more appropriate since rasterizer andalpha-blending units accessible only through the shader API supportthese simple calculations (B. Bai, A. Smith: Fast 3D iterativereconstruction of PET images using PC graphics hardware, in: IEEENuclear Science Symposium, pp. 2787-2790 (2006); F. Xu and K. Mueller:Real-time 3D computed tomographic reconstruction using commoditygraphics hardware, Physics in Medicine and Biology, pp. 3405-3419(2007); G. Pratx, C. Chinn, P. D. Olcott, C. S. Levin: Fast, accurateand shift-varying line projections for iterative reconstruction usingthe GPU, IEEE Trans. on Medical Imaging, Vol. 28, pp. 435-445 (2009)). Atomographic imaging technique for GPU is described in F. Xu and K.Mueller: GPU-Acceleration of Attenuation and Scattering Compensation inEmission Computed Tomography, in: 9th Fully Three-Dimensional ImageReconstruction in Radiology and Nuclear Medicine '07 (2007). However,when an algorithm gets more complex, the incremental rendering pipelineview of the shader API becomes too restrictive and less intuitive (G.Pratx, L. Xing: GPU computing in medical physics: A review, Med. Phys.,Vol. 38, pp. 2685-2698 (2011)), and GPGPU APIs become the better choicefor such tasks. The critical issue of the GPGPU programming is threadmapping, i.e. the decomposition of the algorithm to parallel threadsthat can run efficiently. For example, while simulating particletransport, it is intuitive to mimic how nature works in parallel, andassign parallel computational threads, for example, to randomlygenerated photon paths (A. Wirth, A. Cserkaszky, B. Kári, D. Légrády, S.Fehér, S. Czifrus, B. Domonkos: Implementation of 3D Monte Carlo PETreconstruction algorithm on GPU, in: Nuclear Science SymposiumConference Record (NSS/MIC), IEEE, pp. 4106-4109 (2009)). However, whilesignificant speed ups can be obtained with respect to a CPUimplementation, this somehow natural thread mapping cannot exploit thefull potential of GPUs.

In an iterative tomographic reconstruction, forward projection computingthe expected detector hits from an actual estimation on emission densitydata and back projection correcting the emission density data based onthe measured and expected detector responses, alternate. Equations offorward projection and back projection are similar in the way that theytake many input values—voxel intensities and data of conceptualdetectors, respectively—and compute many unknown values—again, data ofconceptual detectors and voxel intensities, respectively. This kind of“many to many” computation can be organized in two different ways. Knownvalues can be taken one-by-one obtaining the contribution of a singleknown value to all of the unknowns, and accumulate the contributions asthe different known values are visited. This scheme is calledscattering. The opposite—sometimes called orthogonal—approach takesunknown values one-by-one, and obtains the contribution of all knownvalues to this particular unknown value. This approach is calledgathering. Generally, input-driven algorithms are of scattering type,while output-driven ones are of gathering type.

A list mode PET reconstruction using GPU architecture is described in US2011/0182491 A1 and further detailed in a paper (J. Cui, G. Pratx, S.Prevrhal, C. S. Levin: Fully 3D list-mode time-of-flight PET imagereconstruction on GPUs using CUDA, Med. Phys., Vol. 38, No. 12 (2011)).These documents state that for a reconstruction using a GPU it is worthif both the forward projection and the back projection is of gatheringtype. However, the back projection detailed in the documents is ofscattering type.

In U.S. Pat. No. 7,778,392 B1 such a computer tomography reconstructionmethod is described, where the back projection is optimized for GPUarchitecture. In U.S. Pat. No. 7,876,944 B2 a medical imagereconstruction method is described, where a GPU-optimized backprojection step is applied.

In U.S. Pat. No. 6,804,325 B1 such a reconstruction method is described,where the end points of the LORs are randomly distributed on the surfaceof a detector element. In US 2007/0201611 A1 such a PET reconstructionmethod for a GPU is described, in which tri-linear interpolation isused. In U.S. Pat. No. 7,333,107 B2 a gather-scatter approach isdetailed in connection with the forward- and back projection, and thereuse of an already computed data is described for a GPU. In U.S. Pat.No. 7,120,283 B2, US 2010/0266178 A1, U.S. Pat. No. 7,873,236 B2 and US2011/0051893 A1 medical image processing methods are described for aGPU. In US 2010/0128046 A1 Poisson-disk distribution is used forsampling in a GPU architecture.

Unmatching forward and back projection is described for tomographicreconstruction in G. Zeng, G. Gullberg: Unmatchedprojector/backprojector pairs in an iterative reconstruction algorithm,IEEE Transactions on Medical Imaging, Vol. 19, pp. 548-555 (2000); R.Guedouar, B. Zarrad: A comparative study between matched and mis-matchedprojection/back projection pairs used with ASIRT reconstruction method,Nuclear Instruments and Methods in Physics Research, Vol. 619, pp.225-229 (2010); V.-G. Nguyen, S.-J. Lee, M N. Lee: GPU-accelerated 3DBayesian image reconstruction from Compton scattered data, Physics inMedicine and Biology, Vol. 56. pp. 2817 (2011); and N.-Y. Lee, Y. Choi:Theoretical investigation on an unmatched backprojector for iterativereconstruction in emission computed tomography, Journal of the KoreanPhysical Society, Vol. 59, pp. 367-375 (2011).

In US 2008/0095300 A1, U.S. Pat. No. 7,769,217 B2 and U.S. Pat. No.8,000,513 B2 it is described that the forward projection and the backprojection are advantageously matching algorithms. It is described inU.S. Pat. No. 7,856,129 B2 that the method detailed therein can begeneralized to unmatching projections. In U.S. Pat. No. 5,414,623 andU.S. Pat. No. 7,596,202 B2 such methods are described, where a filteringis applied between forward and back projection.

In US 2011/0164799 A1 and US 2011/0194747 A1 such methods are describedwhere filtering is applied for the reconstruction data in the Fourierspace.

In U.S. Pat. No. 7,381,960 B1, U.S. Pat. No. 7,759,647 B2 and U.S. Pat.No. 7,835,782 B2 the problem of positron propagation before annihilationis detailed. In U.S. Pat. No. 7,835,782 B2 a scheme for handling thisproblem is described. In U.S. Pat. No. 7,923,690 B2 and US 2011/0248765A1 it is described that the problem of the positron propagation is nothandled.

Tomographic systems are described in U.S. Pat. No. 6,373,059 B1, U.S.Pat. No. 7,211,799 B2, US 2010/0108896 A1 and US 2011/0127434 A1, inwhich the different sensitivities of the detector elements on thedetection of an event are taken into account. In US 2007/0221855 A1 andWO 2011/036436 A1 the photon transport through adjacent detectorelements is handled.

In U.S. Pat. No. 7,015,477 B2, U.S. Pat. No. 7,332,721 B2, U.S. Pat. No.7,876,941 B2, US 2011/0044546 A1, US 2011/0164031 A1 and U.S. Pat. No.7,983,465 B2 the possible solutions of the equations of forward- and/orback projection are described.

A tomographic reconstruction method is disclosed in M. Magdics, L.Szirmay-Kalos, Á. Szlavecz, G. Hesz, B. Benyó, A. Cserkaszky, J. Lantos,D. Légrády, Sz. Czifrus, A. Wirth, B. Kári, G. Patay, D. Vögyes, T.Bükki, P. Major, G. Németh, B. Domonkos: TeraTomo project: a fully 3DGPU based reconstruction code for exploiting the imaging capability ofthe NanoPET/CT system, 2010 World Molecular Imaging Congress, 11 Sep.2010. This paper discusses direct Monte Carlo particle transport andpresents an adjoint Monte Carlo method which applies gathering typeapproach both in forward and back projection. Both direct photon hitsand scattered photon hits are considered.

Several methods for handling gamma photon scattering in the detectormodules are disclosed in M. Magdics, B. Tóth, L. Szécsi, B. Csébfalvi,L. Szirmay-Kalos, Á. Szlavecz, G. Hesz, B. Benyó, Á. Cserkaszky, D.Légrády. Sz. Czifrus, A. Wirth, B. Kári, J. Lantos, G. Patay, D.Völgyes, P. Major. G. Németh, T. Bükki, B. Domonkos: Detector ModellingTechniques for Pre-Clinical 3D PET Reconstruction on the GPU,Proceedings of the 11th International Meeting on Fully 3D ImageReconstruction in Radiology and Nuclear Medicine, 11 Jul. 2011, pp.375-378.

A photon transport simulation method, that executes the very sameinstruction in parallel threads at a time and eliminates the conditionalinstructions, is disclosed in L. Szirmay-Kalos, B. Tóth, M. Magdics, D.Légrády, and A. Penzov: Gamma Photon Transport on the GPU for PET, in:Large-scale scientific computing, Springer Berlin, Heidelberg, pp.435-442, 4 Jun. 2009.

It is a disadvantage of many of the prior art solutions that the levelof optimization of the reconstruction methods used for the parallelarchitectures is not high enough.

DESCRIPTION OF THE INVENTION

In view of the known solutions there is a need for providing atomographic reconstruction method which can achieve as high accuracy andlow reconstruction time as possible, i.e. near real-time reconstruction.

Therefore, an object of the invention is to provide an emissiontomographic reconstruction method, which is free of the disadvantages ofprior art solutions.

It is another object of the invention to provide an emission tomographicreconstruction method which can achieve as high accuracy and lowreconstruction time as possible, i.e. which maximizes the accuracy ofthe system matrix estimation with a relatively small number of samples.

A further object of the invention is to provide an emission tomographicreconstruction method which is highly optimized for parallelarchitectures.

The object of an embodiment of the invention is to get an algorithm thatis accurate for all types of data, i.e. applicable either forhomogeneous regions or small, high intensity regions to be reconstructedand is fast on the target parallel architecture.

The objects of the invention can be achieved by the emission tomographicreconstruction method according to claim 1, by the computer readablemedium according to claim 14 and the system according to claim 15.Preferred embodiments are defined in the dependent claims.

The main difference between the prior art solutions and the methodaccording to the invention is that the prior art solutions which aremore or less optimized for parallel architectures, described e.g. in US2011/0182491 A1, utilize list mode reconstruction unlike the inventiontargeting binned mode reconstruction.

List mode reconstruction processes a list of individual events, thusevent specific information can be assigned to each detector hit, likephoton energy or the time difference of detections (time of flight).Binned mode reconstruction, on the other hand, works with accumulateddata, which comprise the number of hits in each conceptual detector and,optionally in the case of time of flight measurements, associatedhistograms of energy levels of time of flight measurements. A binnedmode reconstruction, therefore, in spite of dropping out the additionalinformation handled in a list mode reconstruction, can be appropriatefor fast reconstructions. In the binned mode reconstruction methodaccording to the invention, on the top of the fast reconstruction, highaccuracy can be reached by careful implementation as detailed below.

As list mode processes events one-by-one, it cannot reuse computationdone previously when another event in the same conceptual detector wasprocessed. Unlike the list mode reconstruction, the reuse of previouslydone computations can be applied in binned mode reconstruction. To makethis reuse even more effective, the invention proposes the applicationof factoring in some embodiments, which improves the performance ofbinned mode reconstruction and would not be feasible in a list modeapproach.

Finally, as binned mode reconstruction uses all data in an accumulatedform, the complete reconstruction problem can be considered both fromthe point of view of conceptual detectors and from the voxels. Thus,there is the option of developing both conceptual detector driven andvoxel driven algorithms in forward and back projection, respectively,where the signal of the detector elements and the emission density datacorresponding to voxel values are the respective outputs. This iscrucial when the algorithm is designed for parallel implementation wheregenerally output-driven algorithms should be preferred. However, in adata element of list mode, a conceptual detector and several voxels aspossible sources are involved. Even if events of the list are grouped,there are still just a subset of conceptual detectors and possiblesource voxels that could cause these events compared to the binned modereconstruction. As the possible source voxels are determined by theconceptual detectors of the subset, the processing algorithm mustnecessarily be conceptual detector driven. A voxel driven approach isnot feasible in the case of the list mode reconstruction since it wouldconsider all conceptual detectors, so the remaining advantages of listmode processing would be lost. This is exactly why both forwardprojection and back projection are LOR-driven in US 2011/0182491 A1.

A further difference between the prior art solutions and someembodiments of the method according to the invention is that the priorart solutions involve either gathering type approaches, which are goodfor GPU implementation but cannot focus on important, i.e. highcontribution subsets of the data, or scattering type approaches, whichmay reconstruct well important regions but perform poorly on the GPU. Incontrast, the invention combines the advantages of scattering andgathering approaches with the application of multiple importancesampling. In addition, unlike US 2011/0182491 A1 that addresses listmode reconstruction with gather-only forward projection and scatter-onlyback projection, some embodiments of the invention target binned modereconstruction with combined forward and combined back projections.

In the view of the above detailed reasons, the invention is based on therecognition that a binned type reconstruction method can be morethoroughly optimized on parallel architectures than a list modereconstruction method.

BRIEF DESCRIPTION OF DRAWINGS

Preferred embodiments of the invention will now be described by way ofexample with reference to drawings, in which:

FIG. 1 shows a volume of interest placed between two parallel detectordevices,

FIG. 2A shows an exemplary ray bundle between sources and detectordevices,

FIG. 2B shows the exemplary ray bundle with introducing a virtualdetector device/virtual source in between the sources and detectordevices,

FIG. 3 represents an embodiment of the inventive method in a flowchart,

FIG. 4 shows a voxel being on the line connecting two detector elements.

FIG. 5A shows an exemplary independent random distribution of points ina rectangular area,

FIG. 5B shows an exemplary Poisson-disk distribution of points in arectangular area.

FIG. 6 shows an object placed between two parallel detector devices andthe notations in connection with the identification of a LOR and adetector element in a detector device.

FIG. 7 shows a voxel placed between two parallel detector devices andlines connecting pairs of detector elements,

FIG. 8 shows a group of detector elements in which a particle ispropagating,

FIG. 9 shows a voxel placed between two parallel detector devices and aLOR giving contribution to the voxel,

FIG. 10 represents an embodiment, where no filtered sampling is applied,

FIG. 11 represents a further embodiment with filtered sampling involved,

FIG. 12A shows a Derenzo phantom,

FIG. 12B shows a homogeneity phantom,

FIGS. 13A-13C show comparison of various embodiments of the inventionfor point phantom, homogeneity phantom and Derenzo phantom,

FIGS. 14A and 14B show a reconstruction result for an animal obtained byvarious embodiments of the method according to the invention

MODES FOR CARRYING OUT THE INVENTION

The emission tomographic reconstruction method according to theinvention is carried out in a parallel architecture. In the following,such embodiments of the inventive method are detailed mainly where theparallel hardware architecture is at least one Graphics Processing Unit(GPU). The detailed embodiment of the reconstruction method according tothe invention can also be applied in other parallel architectures withslight changes within the scope of the invention. Furthermore most ofthe details of the embodiments are described in connection with PETreconstruction, but the most of the embodiments described below can beused for other tomographic reconstructions, such as SPECT.

The invention is a method for emission tomographic reconstruction frommeasurement data, the method comprising the steps of: obtaining themeasurement data corresponding to activity of particle emissions of avolume from a tomographic imaging system, and performing at least oneiteration step to obtain emission density data of particle emissions ofthe volume from the measurement data, wherein the at least one iterationstep comprises a forward projection being a projection from the emissiondensity data to the measurement data, and a back projection being aprojection from the measurement data to the emission density data, andthe at least one iteration step is carried out in a parallel hardwarearchitecture, wherein both the forward projection and the backprojection are of at least partially gathering type, and the measurementdata is of binned type. The tomographic imaging system can be preferablya positron emission tomograph or a single photon emission computedtomograph as detailed above. The details of the forward and backprojections and other details of the invention are given below.

An embodiment may be embodied in the form of computer-implementedprocesses and apparatuses for practicing those processes. An embodimentmay also be embodied in the form of a computer program code containinginstructions embodied in tangible media, such as floppy diskettes,CD-ROMs, hard drives, or any other computer readable storage medium,wherein, when the computer program code is loaded into and executed by acomputer, the computer becomes an apparatus for carrying out the method.An embodiment may also be embodied in the form of computer program code,for example, whether stored in a storage medium, loaded into and/orexecuted by a computer, or transmitted over some transmission medium,such as over electrical wiring or cabling, through fiber optics, or viaelectromagnetic radiation, wherein when the computer program code isloaded into and executed by a computer, the computer becomes anapparatus for carrying out the method. When implemented on ageneral-purpose microprocessor, the computer program code segmentsconfigure the microprocessor to create specific logic circuits.

Furthermore, the invention can take the form of a computer programproduct accessible from a computer-usable or computer-readable mediumproviding program code for use by or in connection with a computer orany instruction execution system. For the purposes of this description,a computer-usable or computer readable medium can be any apparatus thatmay include, store, communicate, propagate, or transport the program foruse by or in connection with the instruction execution system,apparatus, or device. The medium can be an electronic, magnetic,optical, electromagnetic, infrared, or semiconductor system (orapparatus or device) or a propagation medium. Therefore, the inventioncan take the form of a computer readable medium comprising a computerreadable program for processing tomographic reconstruction frommeasurement data, wherein the computer readable program when executed ona computer causes the computer to perform the steps of any embodimentsof the method according to the invention.

Embodiments can take the form of an entirely hardware embodiment, anentirely software embodiment or an embodiment including both hardwareand software elements. In a preferred embodiment, the present inventionis implemented in software, which includes but is not limited tofirmware, resident software, microcode, etc.

Embodiments can take the form of a system for processing tomographicreconstruction from measurement data, characterized by comprising anobtaining unit for obtaining the measurement data corresponding toactivity of particle emissions of a volume from a tomographic imagingsystem; and an iteration unit for performing at least one iteration stepto obtain emission density data of particle emissions of the volume fromthe measurement data, wherein the iteration unit comprises a forwardprojection unit for performing a forward projection for projecting theemission density data to the measurement data: and a back projectionunit for performing a back projection for projecting the measurementdata to the emission density data; and wherein the iteration unitcomprises a parallel hardware architecture, the forward projection andthe back projection are of at least partially gathering type, and themeasurement data is of binned type.

The reconstruction method according to the invention is implemented onGPU in some embodiments. The critical issue of the GPU programming andparallel programming in general, is thread mapping, i.e. thedecomposition of the algorithm to parallel threads of a parallelarchitecture, so that the algorithm can run efficiently.

A GPU is a collection of multiprocessors, where each multiprocessor hasseveral SIMD scalar processors that share the instruction unit and thusalways execute the same machine instruction. Thus, while the GPUexecutes many computational threads of the same program called kernel,the threads allocated to a single multiprocessor preferably have thesame control path. If two threads of the same multiprocessor executedifferent branches of a conditional instruction due to datadependencies, then the scalar processor should go through both branches,what would degrade performance. Thus, in the algorithm of thereconstruction method according to the invention the dependence of flowcontrol on input data is preferably minimized. Especially preferably,data independent flow control is performed in at least a part of thescalar processors and/or the control path of parallel threads isidentical for all possible input data.

The tomographic reconstruction method according to the invention ispreferably free of conditional instructions and the algorithm connectedto the method is preferably decomposed to phases or steps implemented bydifferent kernels. The specific kernels are started only if the datarequires execution. The iteration steps of the inventive method arecarried out in parallel threads of a parallel architecture. In at leasta part of the steps obtained by the factorization as detailed below,algorithms corresponding to the step are carried out in parallel threadsbeing of identical length. This can be achieved by eliminating variablelength loops from the program implementing a respective step, even ifthe constant length loops have more cycles than needed by the requiredaccuracy.

Synchronization of the threads and atomic operations are an availableoption on a GPU but they are expensive to use, so these are preferablyavoided in the inventive method. No atomic operation is needed ifthreads modify only their own private memory location. However,different threads may read the same memory location while the dataremains constant. Generally, global memory access is slow on the GPU,especially when atomic writes are needed to resolve thread collisions,so it is preferably avoided according to the invention.

An iterative reconstruction method comprises a forward projection and aback projection, therefore there are four different ways to implementthe method depending on whether scattering or gathering type approach isapplied in forward projection and back projection. It is emphasized thatthe distinction of these cases might be just the order of loops in a CPUimplementation, but is a crucial design decision when the algorithm runson the GPU since it defines which loop is executed in parallel on theshader processors, while the scattering type steps are not favored in aparallel architecture. GPUs and in general parallel algorithms favorgathering since it computes a single result from the available data,which can be written out without communication between the computationalthreads and the synchronization of them.

The communication between threads of different multiprocessors is donevia an external memory, which is slow. Therefore in general, externalmemory accesses are preferably minimized and all communication isavoided between threads in the method according to the invention.Running independent threads on different processors boosts performanceproportionally with the number of processors. However, as computationson different processors are independent, temporary results obtained bydifferent processors cannot be generally reused, which is obviouslypossible and worthwhile in a single thread implementation. Thus,efficiency improvement via reuse and performance enhancement viaparallelism are contradicting requirements. To reuse partial results ofother threads without prohibitive synchronization times, the algorithmshould be broken into phases, i.e. the system matrix of the inventivemethod is preferably factored—as detailed below—similarly as describedin an above mentioned paper (J. Qi, R. M. Leahy, S. R. Cherry, A.Chatziioannou, and T. H. Farquhar: High-resolution 3D Bayesian imagereconstruction using the microPET small-animal scanner, Physics inMedicine and Biology, Vol. 43, pp. 1001-1013 (1998)). In this paper thefactorization of the system matrix is done for the sake of reducingstorage and computational costs, not to allow data reuse in a parallelscheme. When threads implementing a phase are terminated, they write outthe results to the global memory. The threads of the next phase caninput these data in parallel without explicit communication, thereforedata of consecutive steps is at least partially reused.

The global memory is generally very slow compared to the register andlocal memory access and to the computational performance of processors.There is, however, one exception to this rule. If threads of the samescalar processor access neighboring data elements, then the transfer isexecuted in a single step which decreases the access time. Therefore,neighboring threads are preferably designed to access neighboring dataelements. For example, in pure Monte Carlo particle tracing threadsfetch the memory randomly, thus coherent access cannot be guaranteed.However, if particles are transferred by ray-bundles, i.e. in a step alarger set of coherent rays are processed, memory access speed can begreatly increased.

In view of the above details, the development of algorithms for thereconstruction of advanced high-resolution tomographic data ischallenging due to the enormous data storage and computationrequirements. As mentioned above, the required output of tomographicreconstruction methods is the emission density data for a binned typereconstruction. The emission density data is given e.g. by a functionx({right arrow over (v)}) in points {right arrow over (v)} of a volumeof interest □. The volume of interest □ is the volume where the positronemitting tracer material is dispersed, for example in a body to beinvestigated. The input of the reconstruction method is the measurementdata, i.e. the measured number of hits in the conceptual detectors, e.g.LORs for example in the case of PET. The measurement data is representedby a vector y=(y₁, y₂, . . . ), y comprises the number of hits per aconceptual detector, thus y has No components, where N_(D) is the numberof conceptual detectors.

In FIG. 1 a pair of detector elements 13′ and 13″ comprised inrespective detector devices 10′ and 10″, constitutes a conceptualdetector, i.e. a LOR in the case of the depicted PET system. Thedetector device 10′ and 10″ comprises other detector elements 12 aswell. Propagation line 14 of a photon pair in direction {right arrowover (w)} is illustrated in the figure. This photon pair is detected ondetector elements 13′ and 13″, thus gives a contribution to the LOR ofthe detector elements 13′ and 13″. The photon pair is emitted roughly inparallel by the annihilation of a positron. The point of emission of thephoton pair is denoted by {right arrow over (v)} in FIG. 1 and theemitted photons are propagating in the {right arrow over (w)} and−{right arrow over (w)} directions, respectively. The measurement dataof the LOR connecting detector elements 13′ and 13″ can also compriseother events. A volume of interest is also shown in FIG. 1, whichcomprises a head in the shown example.

To represent the unknown x({right arrow over (w)} ) function with finitedata, it is approximated in a finite function series form, i.e. thevolume of interest is divided into sub-volumes called voxels:

${x\left( \overset{\rightarrow}{v} \right)} = {\sum\limits_{V = 1}^{N_{voxel}}\; {x_{V}{b_{V}\left( \overset{\rightarrow}{v} \right)}}}$

where x_(v) is a V^(th) voxel value. Voxel values are elements of vectorx=(x₁, x₂, . . . ), which has N_(voxel) components. In a reconstructionmethod these voxel values are the unknown coefficients which have to befound based on measurement data. In the above expression b_(v)({rightarrow over (v)}) functions are pre-defined basis functions for eachvoxel.

The expectation value of the number of hits from all points toconceptual detector D is:

${\overset{\sim}{y}}_{D} = {{\int_{V}{{x\left( \overset{\rightarrow}{v} \right)}{T\left( \overset{\rightarrow}{v}\rightarrow D \right)}\ {v}}} = {\sum\limits_{V = 1}^{N_{voxel}}\; {A_{DV}x_{V}}}}$

where {tilde over (y)}_(D) is the expected number of hits for conceptualdetector D corresponding to a certain x({right arrow over (v)}) emissiondensity data, T({right arrow over (v)}→D) is the system sensitivity,i.e. the probability that an emission in point {right arrow over (v)} isdetected by the conceptual detector D, □ is the volume of interest, and

$A_{DV} = {\int_{V}{{b_{V}\left( \overset{\rightarrow}{v} \right)}{T\left( \overset{\rightarrow}{v}\rightarrow D \right)}\ {v}}}$

is an element of the system matrix A which connects the conceptualdetector D and voxel V. The difference between y_(D) and {tilde over(y)}_(D) should be emphasized; y_(D) is the measured number of hits forconceptual detector D, i.e. the measurement data and {tilde over(y)}_(D) is the expected number of hits for a given x({right arrow over(v)}) distribution, therefore its value changes during thereconstruction process as x({right arrow over (v)}) changes during theiteration steps.

According to the concept of maximum likelihood estimation (L. Shepp andY. Vardi: Maximum likelihood reconstruction for emission tomography,IEEE Trans. Med. Imaging, Vol. 1, pp. 113-122 (1982)), i.e. expectationmaximization (EM), a function x({right arrow over (v)}) is searched forthat maximizes the likelihood of the measured data assuming thatmeasured values are independently and randomly distributed and followPoisson distribution. This optimization alternates forward projection

${\overset{\sim}{y}}_{D} = {\sum\limits_{V = 1}^{N_{voxel}}\; {A_{DV}x_{V}^{(n)}}}$

and back projection

$x_{V}^{({n + 1})} = {x_{V}^{(n)} \cdot \frac{\sum\limits_{D = 1}^{N_{D}}\; {A_{DV}\frac{y_{D}}{{\overset{\sim}{y}}_{D}}}}{\sum\limits_{D = 1}^{N_{D}}\; A_{DV}}}$

in each of the n=1, 2, . . . iteration steps. Expectation maximizationused in preferred embodiments of the invention. As the formula of backprojection shows, in back projection the estimation of emission densitydata is corrected based on the ratio of measured hit numbers andexpected hit numbers.

FIG. 2A shows the conceptual model of emission tomography. The 3D sourceof emitted particles is divided into several voxels, e.g. voxels 16′,16″, 16′″. Emitted particles may end up in detector elements 12′, 12″,12′″ after traveling in space including possible scattering andtype-changes as demonstrated by particle paths 15. After making anassumption on the voxel values for each voxel in back projection, theexpected values of the conceptual detector hits have to be computed inthe forward projection, which require the integration of thecontribution of all possible particle paths. As the source and thedetector elements have 3D domain, and scattering can happen anywhere inthe 3D space, the contribution of sources to detector elements is ahigh-dimensional integral in the domain of source points, detectorpoints and arbitrary number of scattering points. Such high-dimensionalintegrals are preferably calculated by a Monte Carlo quadrature thatrandomly traces sample paths. The more paths are computed, the higherprecision reconstruction is obtained.

As detailed above, to have the possibility to reuse temporary results ofother threads, the particle transport process according to the forwardand back projections is preferably decomposed to phases by introducingvirtual detector devices comprising virtual detector elements. As it isdescribed below, this factorization has other advantages as well. InFIG. 2B virtual detector elements 18′, 18″, 18′″ are illustrated. By theapplication of virtual detector elements 18′, 18″, 18′″ the particletransport process is factorized. It is clear from the figure that thefactorization leads to the segmentation of particle paths 15 into pathsegments 17 and 19. In the simulation of particle propagation, theirpaths are first executed to the virtual detector elements—path segments17 in FIG. 2B, then virtual detector elements 18′, 18″, 18′″ becomevirtual sources and the next phase simulates transport from the virtualsources to the real detector elements—path segments 19.

Consequently, the idea of factoring is that the transport process. i.e.the whole propagation from the emission of a particle until it isdetected in a detector element, is decomposed to phases with theintroduction of virtual detector elements. FIG. 2B shows decompositioninto two phases, but more than two phases are also possible in atomography reconstruction method. First, the expected values in thefirst layer of virtual detector elements are computed from the source.Then, the first layer of these detector elements become sources and asimilar algorithm is executed until the real detector elements arereached. The advantages of the factorization are the following:

-   -   The calculation of a single phase can be much simpler than the        complete transport process, thus all conditional statements can        be preferably eliminated, which increases GPU efficiency.    -   As a computed sample path ended in a virtual detector element is        continued by all paths started here in the next phase, a much        higher number of sample paths will be available to compute high        dimensional integrals, thus the result is more accurate. Note        that the number of sample paths increased from 4 to 16, in the        arrangements of FIG. 2A to FIG. 2B.    -   Each phase is computed in parallel on the GPU where threads do        not communicate. However, a next phase can reuse the results of        all threads of a previous phase, so redundant computations can        be eliminated.        The only disadvantage of factoring is that virtual detector        devices discretize the continuous space into finite number of        bins, so if their number is small, a discretization error        occurs. The optimal definition of the virtual detector devices        depends on the type of the tomograph. Examples are provided in        the context of PET in the following. After factoring, the phases        are considered one-by-one and the above introduced transport        equation

${\overset{\sim}{y}}_{D} = {\int_{V}{{x\left( \overset{\rightarrow}{v} \right)}{T\left( \overset{\rightarrow}{v}\rightarrow D \right)}\ {v}}}$

should be reinterpreted in the factored case. When factorization isused, □ is the domain of source distribution considered in theinvestigated phase, {tilde over (y)}_(D) is the expected number of hitsin detector element D of the phase, which can be virtual detectorelements, and T({right arrow over (v)}→D) is the probability density oftransporting from virtual source point {right arrow over (v)}to virtualdetector element D.

When a factored phase is computed, both gathering and scattering typealgorithms can be proposed that may have significantly differentaccuracy depending on the data to be reconstructed. For example, onetype of the algorithms can be good to estimate larger homogeneousregions, while the other type to estimate small high intensity features.The gathering type and the scattering type algorithms may be associatedwith significantly different computational cost on the parallelhardware.

To summarize, it can be said that for the optimal compromise betweenthread independence and reuse of temporary results, forward projectionand back projection are preferably factored to phases or steps that canbe executed independently and without communication. The factored stepscan be based on different computation paradigms, including analyticcalculations, numerical quadrature, or exploitation of measured orsimulated data. According to the GPU requirements, all steps are atleast partly of gathering type in the reconstruction method according tothe invention. Gathering is implemented in a way that a thread computesa single output value from all relevant inputs. In the forwardprojection of a PET reconstruction method, the computation processfollows the direction from the voxels of positron density towards theexpected LOR hits. Furthermore in back projection of it, the directionis from the ratios of expected and measured LOR values to the new voxelvalues, so the factored phases are computed in the reverse order and theinput and the output of the phases are exchanged.

Multiple importance sampling is a general mathematical approach tocombine different Monte Carlo estimators (E. Veach and L. Guibas:Optimally combining sampling techniques for Monte Carlo rendering.Computer Graphics Proceedings, Annual Conference Series, 1995 (ACMSIGGRAPH '95 Proceedings), pp. 419-428 (1995)). In an embodiment of theinventive method, multiple importance sampling is proposed forapplication in phases of emission tomography to meet the goals ofaccuracy and efficient parallel implementation. These phases considerthe transport from either virtual or real sources to either virtual orreal detector devices. The application of the concept of multipleimportance sampling for the evaluation of the transport equation

${\overset{\sim}{y}}_{D} = {\int_{V}{{x\left( \overset{\rightarrow}{v} \right)}{T\left( \overset{\rightarrow}{v}\rightarrow D \right)}\ {v}}}$

is presented below when two different methods, an output-driven i.e.gathering type and an input-driven, i.e. scattering type are usedsimultaneously. The special case of using just a single—i.e. gatheringor scattering—type method is also obtained by setting the sample numbersof other methods to zero.

With the application of multiple importance sampling the reconstructionmethod in these embodiments becomes highly accurate for all types ofdata, i.e. applicable either for homogeneous regions or small, highintensity regions to be reconstructed and is fast on the target parallelarchitecture.

The details of the implementation of the multiple importance samplingare the following. Generally in the phases of the forward projection andthe back projection, a set of (virtual) sources, i.e. voxels and a setof (virtual) detector elements are considered as introduced above. In astep using multiple importance sampling N_(o) output-driven, i.e.detector oriented samples in the points {right arrow over (v)}, . . . ,{right arrow over (v)}_(N) _(o) of the source domain are taken for eachdetector element D, drawn from probability density p_(o)({right arrowover (v)}, D) which is at least approximately proportional to thetransport probability, T({right arrow over (v)}→D). The density ofsamples is d_(o)({right arrow over (v)},D)=N_(o) p_(o)({right arrow over(v)}, D), thus the Monte Carlo estimation of detected hits of thissampling strategy would be:

${\overset{\sim}{y}}_{D}^{o} \approx {\sum\limits_{x = 1}^{N_{o}}\; {\frac{{x\left( {\overset{\rightarrow}{v}}_{s} \right)}{T\left( {\overset{\rightarrow}{v}}_{s}\rightarrow D \right)}}{a_{o}\left( {{\overset{\rightarrow}{v}}_{s},D} \right)}.}}$

Then, additionally N_(i) input-driven, i.e. source oriented samples{right arrow over (v)}_(N) _(o) ₊₁, . . . , {right arrow over (v)}_(N)_(o) _(+N) _(o) _(i)({right arrow over (v)}) are obtained fromprobability density p_(i)({right arrow over (v)}) which is at leastapproximately proportional to the source emission density, x({rightarrow over (v)}). Thus, the density of samples is d_(i)({right arrowover (v)})=N_(i) p_(i)({right arrow over (v)}). The Monte Carloestimation of the number of hits detected in detector element D usingthis sampling strategy would be:

${\overset{\sim}{y}}_{D}^{i} = {\sum\limits_{x = {N_{o} + 1}}^{N_{o} + N_{i}}\; {\frac{{x\left( {\overset{\rightarrow}{v}}_{s} \right)}{T\left( {\overset{\rightarrow}{v}}_{s}\rightarrow D \right)}}{d_{l}\left( {\overset{\rightarrow}{v}}_{s} \right)}.}}$

Instead of these estimators considering just a single sampling strategy,the use of the mixture of the two sample sets is proposed according toan embodiment of the invention to compute the expected number ofdetector element hits as:

${\overset{\sim}{y}}_{D} = {\sum\limits_{x = 1}^{N_{o} + N_{i}}\; {\frac{{x\left( {\overset{\rightarrow}{v}}_{s} \right)}{T\left( {\overset{\rightarrow}{v}}_{s}\rightarrow D \right)}}{{d_{l}\left( {\overset{\rightarrow}{v}}_{s} \right)} + {d_{o}\left( {{\overset{\rightarrow}{v}}_{s},D} \right)}}.}}$

Individual densities d_(i) or d_(o) do not have to cover the totalintegration domain, it is enough if at least one of the strategiesgenerates a given source point of non-zero emission with non-zeroprobability. Note that in the case of forward projection, detectororiented sampling leads to an output-driven, i.e. gathering typealgorithm and is thus more efficient on parallel hardware, while sourceoriented sampling leads to an input-driven, i.e. scattering typealgorithm that may result in colliding writes. Number of samples N_(i)and N_(o) must be set to consider the requirements that combined densityd_(o)({right arrow over (v)}, D)+d_(i)({right arrow over (v)}) should beroughly proportional to integrand x({right arrow over (v)})T({rightarrow over (v)}→D) and the total computational cost of N_(i) sourceoriented and N_(o) detector oriented samples should be minimal takinginto account the parallel hardware.

As parallel hardware typically favors gathering algorithms, it is worthusing more output-driven samples than input-driven ones. However, ifsome phenomenon in smaller regions is much better sampled by aninput-driven approach, then its density will be higher in this smallerregion than that of the output-driven samples. Thus, the advantages ofinput-driven sampling can be preserved even if their number is small.Input-driven samples can represent well strongly inhomogeneous sourcescontaining small, intensive features, and thus are good inreconstructing them. Output-driven samples, on the other hand, cover thewhole source domain fairly uniformly and are thus good at reconstructinglarger homogeneous features. The combined method, i.e. a reconstructionmethod according to an embodiment of the invention is good in allaspects since the sampling strategy automatically emphasizes a methodwhere it places more samples.

In the following, the application of an embodiment of the inventivemethod is demonstrated for PET reconstruction but it is similarlyfeasible for SPECT reconstruction taking into account that there is nopositron propagation step in a SPECT reconstruction. The overview of aPET reconstruction algorithm is shown in FIG. 3. In FIG. 3, the detailsof an iteration step are shown in one embodiment of the inventivereconstruction method. As it is shown in the figure the transportprocess of the forward projection is decomposed into three factoredsteps in this embodiment.

-   -   1. Positron transport step for describing the propagation of the        positron from a point of origination to a point of annihilation        where the positron annihilates and two oppositely directed gamma        photons are born. This step computes an annihilation density        x^(a)({right arrow over (v)}) from positron emission density        x({right arrow over (v)}), as an exemplary emission density        data.    -   2. Photon transport step that describes photon propagation from        the annihilation point—or point of origination—to a surface of a        detector crystal, i.e. detector element. In this step the        expected number of hits on the surfaces of the detector elements        (LORs), {tilde over (y)}_(L) ^(surf), is computed from the        annihilation density.    -   3. Detector response step (detector evaluation step) that        describes photon propagation from the surface of the detector        element to the output of the detector element. In other words        this step is responsible for all phenomena happening in the        detector elements, i.e. crystals, including inter-crystal        scattering, absorption, and the sensitivity of crystals and        electronics.

The system matrix connecting the measurement data and the emissiondensity data is factorized in this embodiment into sub-matricesaccording to the photon transport step, the detector evaluation step andthe positron transport step. Each of these phases is detailed below, butthe whole iteration cycle is detailed here on the basis of FIG. 3.Starting from the positron emission density data x({right arrow over(v)}) denoted by a dot on the left hand side of the figure, theannihilation density, x^(a)({right arrow over (v)}) is calculated fromx({right arrow over (v)}) by the application of a positron transportstep. Annihilation of the positrons into a pair of photons happens atthis point, so in the next phase the transport of the photon pair is tobe considered. The multiple importance sampling scheme is used in thephoton transport step as it is demonstrated in FIG. 3. The bubblecorresponding to the input-driven transport is encircled by a dashedline. The results coming from the two paths, are synthesized into thevirtual detector element data, {tilde over (y)}_(L) ^(surf). Thesevirtual detector elements are the real detector elements in thisdetailed example, but considering the next phase, i.e. the detectorevaluation step, it should be emphasized that the virtual detectorelement data, {tilde over (y)}_(L) ^(surf), is not the data which haveto be compared to the measurement data of the detector elements. Themeasurement data have to be compared to the virtual detector elementdata obtained after the detector response step, i.e. {tilde over(y)}_(L) ^(det). The virtual detector elements inserted after thedetector response phase are detector elements of the respectiveconceptual detectors, i.e. LORs in the case of PET. It is shown in FIG.3 that multiple importance sampling is taken into account also in thephase of detector response. In back projection, the results obtainedfrom the forward projections, i.e. {tilde over (y)}_(L) ^(det) arecompared to the measurement data of the respective LORs. To summarize,multiple importance sampling is involved in the present embodiment insuch a way that the photon transport step and/or the detectorresponse/evaluation step are of at least partly scattering type. Thescattering methods are introduced in order to make the presentembodiment of the inventive method more suitable for the reconstructionof any kind of measured data. Note, that the usage of only a smallnumber of input-driven threads improves the results of thereconstruction.

The reads and writes indicated by arrows are also distinguished by theirtypes in FIG. 3. Arrows with a solid line correspond to parallelnon-exclusive reads from arrays that are constant during the thread run.This means that each data of the array can be used by several threads,but these threads do not alter the value of the data. Such a read isused e.g. when contributing voxels are collected for a LOR, value of asingle voxel can give contribution for more than one LOR. Arrows with adotted line correspond to the case of non-colliding writes, a thread inthis case writes only its own private data element. This kind of writeis used in the case of output-driven, i.e. gathering type threads whichare preferred for a parallel architecture. In the case where the writesare non-colliding, there is no need to use atomic writes. Atomic writesonly have to be used for threads denoted by a dashed arrow in FIG. 3,i.e. an atomic write has to be used in the case of an input-driventhread. Note that preferably only a small number of input-driven threadsare used compared to the number of the output-driven threads, thereforethe use of atomic writes is not typical in the method, but the use ofinput-driven threads makes it possible that the inventive reconstructionmethod in this embodiment can be used more general for any kind of data.

Note that during the reconstruction, registered CT data can be also usedfor taking into account material dependent positron transport, i.e. thetype of the material comprised in a voxel can be taken into account inthe positron transport phase.

Note that the embodiment introduced in FIG. 3 can be modified in such away that only the output-driven algorithms are used in photon transportstep and/or detector evaluation step. Also the amount of input-drivendata used in the reconstruction can be varied e.g. by the type of theinvestigated material.

In the following some of the above introduced steps of the forwardprojection are detailed. The steps which are not detailed below areperformed in a way known per se. As also mentioned above, forwardprojection simulates the physical phenomena from the emission ofpositrons to the output of detector electronics and executes thefactored steps of positron transport, photon transport (also referred toas geometric forward projection if scattering is computed separately),and detector evaluation that itself consists of detector blurring andsensitivity simulation.

First, in the positron transport step, the followings are taken intoconsideration. The annihilation density x^(a)({right arrow over(v)}_(a)) due to positron range is computed as a blurring of the currentestimate of the positron density with a filter kernel storing theprobabilities of positron offsets between positron generation andannihilation:

${x^{a}\left( {\overset{\rightarrow}{v}}_{a} \right)} = {\int_{V}{{x\left( {\overset{\rightarrow}{v}}_{p} \right)}{P\left( {{\overset{\rightarrow}{v}}_{a} - {\overset{\rightarrow}{v}}_{p}} \right)}\ {v_{p}}}}$

where P({right arrow over (v)}_(a)−{right arrow over (v)}_(p)) is theprobability density that a positron born in {right arrow over (v)}_(p)annihilates in {right arrow over (v)}_(a). This convolution likeexpression is valid only if the material is homogeneous. To extend it tothe inhomogeneous case, a local homogeneous approximation can be appliedand the blurring kernel can be used that is associated with the materialwhere the positron is emitted. This approximation is accurate far frommaterial boundaries but loses its accuracy close to the boundary.Denoting the index of the material at point {right arrow over (v)}_(p)by m({right arrow over (v)}_(p)), expression

${x^{a}\left( {\overset{->}{v}}_{a} \right)} = {\sum\limits_{m}{\int_{v}{{x\left( {\overset{->}{v}}_{p} \right)}{\xi_{m}\left( {\overset{->}{v}}_{v} \right)}{P_{m}\left( {{\overset{->}{v}}_{a} - {\overset{->}{v}}_{p}} \right)}{v_{p}}}}}$

is obtained. In the expression, ξ_(m)({right arrow over (v)}_(p)) is anindicator function that is 1 if in point {right arrow over (v)}_(p)there is material of index m and zero otherwise. As the computationalcomplexity of filtering in spatial domain is proportional to the productof the voxel numbers in the positron density volume and the kernel,spatial filtering gets prohibitively expensive for large kernels. Thefeasible approach is the execution of the filtering in frequency domain,for which efficient 2D and 3D fast Fourier transformation solutionsexist for the GPU (NVIDIA, http://developer.nvidia.com/cuda, in: TheCUDA Homepage.).

Computing the Fourier transformation (denoted by calligraphic F) of bothsides, the expression

${x^{a}\left( \overset{->}{v} \right)} = {\mathcal{F}^{- 1}\left\lbrack {\sum\limits_{m}\; {{\mathcal{F}\left\lbrack {{x(x)}{\xi_{m}\left( \overset{->}{x} \right)}} \right\rbrack} \cdot {\mathcal{F}\left\lbrack {P_{m}\left( \overset{->}{v} \right)} \right\rbrack}}} \right\rbrack}$

is obtained. Note that this computation requires the Fourier transformsof the blurring functions computed during pre-processing for eachmaterial, the Fourier transformation of the positron density once foreach material type (usually two or three), and a single inverse Fouriertransformation.

Instead of using the kernel associated with the material of the positronemission location, the kernel of the material at the position of theannihilation can also be used, which leads to the following convolution:

${x^{a}\left( {\overset{->}{v}}_{a} \right)} = {\sum\limits_{m}\; {{\xi_{m}\left( {\overset{->}{v}}_{a} \right)}{\int_{v}{{x\left( {\overset{->}{v}}_{p} \right)}{P_{m}\left( {{\overset{->}{v}}_{a} - {\overset{->}{v}}_{\mu}} \right)}\ {{v_{p}}.}}}}}$

The convolutions in the sum can also be computed via Fouriertransformations:

${x^{a}\left( \overset{->}{v} \right)} = {\sum\limits_{m}\; {{\xi_{m}\left( \overset{->}{v} \right)}\mathcal{F}^{- 1}{\mathcal{F}}{x\left( \overset{->}{v} \right)}{{{\cdot {\mathcal{F}\left\lbrack {P_{m}\left( \overset{->}{v} \right)} \right\rbrack}}}.}}}$

This second option is more expensive computationally since here thenumbers of both the Fourier transformations and the inverse Fouriertransformations are equal to the number of materials. The accuracy ofthe two techniques depends on whether or not the material including mostof the radioisotopes occupies a larger part of the object.

Herebelow, the description of the photon transport step is detailed. Inthe present embodiment, only a geometric model is considered, wherescattering and random events do not take place.

As the input and output of the step of the photon transport are thevoxel array defining the annihilation density and the expected number ofhits in LOR L, {tilde over (y)}_(L) ^(surf), respectively, a gatheringapproach leads to a parallelization scheme where a thread is responsiblefor computing a single LOR from all relevant voxels.

Considering only the geometry in the present embodiment, i.e. ignoringthe scattering of the propagating particle in the investigated object, adetector element pair can be affected only if its detector elements areseen at opposite directions from annihilation point {right arrow over(v)}. Note, that the photon non-collinearity can be modeled not onlyduring the geometric part of the photon pair transport, but can also beapproximated as a LOR-space blurring (A. Rahmim, J. Tang, M. Lodge. S.Lashkari, M. Ay, R. Lautamaki, B. Tsui, F. Bengel: Analytic systemmatrix resolution modeling in PET: an application to Rb-82 cardiacimaging, Phys Med Biol., Vol. 53, pp. 5947-65 (2008)). The aboveassumption also means that annihilation point {right arrow over (v)}where the photon pair is born and their directions {right arrow over(w)} and {right arrow over (w)} unambiguously identify detector elementhit points {right arrow over (z)}₁ and {right arrow over (z)}₂, oralternatively, from detector element hit points {right arrow over (z)}₁and {right arrow over (z)}₂, those points {right arrow over (v)} anddirection {right arrow over (w)}, which can contribute can bedetermined, as it is shown in FIG. 4. The view point is modifiedherebelow from the annihilation points and directions to detectorelement hit points, and using the correspondence between them, thedetector response is expressed as an integral over the detector elementsurfaces and an integral along lines between the surface points.

The integrals over the detector element surfaces can be estimated byN_(detline) discrete line samples, i.e. point pairs ({right arrow over(z)}₁ ^((s)), {right arrow over (z)}₂ ^((s))) on the surface of detectorelements 13′ and 13″. For the evaluation of the line integral. Siddon'salgorithm (R. L. Siddon: Fast calculation of the exact radiological pathfor a three-dimensional ct array, Medical Physics, Vol. 12. pp. 252-257(1985)) would be a usual choice, but it assumes piece-wise constantfinite-element approximation and may end up different number of loopcycles for different line samples, which degrades multiprocessorperformance. Thus, according to an embodiment of the inventive method,N_(march) equidistant points {right arrow over (l)}_(s,r)—index r runsfrom 1 to N_(march)—are taken along each line segment s ({right arrowover (z)}₁ ^((s)), {right arrow over (z)}₂ ^((s))) and the line integralis preferably evaluated with trapezoidal quadrature. Note that thisscheme is applicable not only for piece-wise constant but for otherbasis-functions as well, its implementation does not need conditionalinstructions, and is very fast if x^(a)({right arrow over (l)}_(s,r)) isread from a 3D texture since tri-linear interpolation is directlysupported by the hardware, and the probability that neighboring threadsneed neighboring voxels is high, which improves cache utilization. Thisapproach is similar to Joseph's method (P. M. Joseph: An improvedalgorithm for reprojecting rays through pixel images, IEEE Transactionson Medical Imaging, Vol. 1, pp. 192-196 (1982)) but may take more thanone sample in a voxel. The extra samples slightly increases the accuracysince tri-linear basis functions lead to cubic integrands in the lineintegral. The extra samples are not worth being eliminated to improveperformance since the SIMD architecture does not prefer different loopcycles in different threads computing different line integrals.

The error of the estimation of the forward projection depends on thenumber of line samples N_(detline) and point samples per line N_(march),and also on the discrepancy of the samples in the five-dimensional spacedefined by the two two-dimensional spaces of the detector elementsurfaces and the one-dimensional space of the line (H. Niederreiter:Random number generation and quasi-Monte Carlo methods, SIAM,Pennsylvania, 1992).

For choosing point pairs ({right arrow over (z)}₁ ^((s)), {right arrowover (z)}₂ ^((s)) on detector element surfaces, Poisson-disk distributedsample points can be used for sampling each of the detector elementsurfaces, respectively. Sampling a surface by independent randomdistribution and by Poisson-disk distribution are both demonstrated inFIGS. 5A and 5B, respectively. It is clear from the figure that thesurface of a detector element can be covered in a more uniform way usingPoisson-disk distribution. Sets of sample points should be preferablycomputed only once and the result can be stored in a constant array.Line integrals between the sample points are approximated preferablytaking equidistant points {right arrow over (l)}_(s,r) with distanceΔl_(s) starting with a random offset. Δl_(s) and {right arrow over(l)}_(s,r) are demonstrated in FIG. 6. With these, the integralestimator is:

${\overset{\sim}{y}}_{L}^{a} = {\sum\limits_{x = 1}^{N_{detline}}\; {\sum\limits_{r = 1}^{N_{detline}}\frac{{x^{a}\left( {\overset{->}{l}}_{s,r} \right)}/\left( {2\; \pi} \right)}{d_{o}\left( {{\overset{->}{l}}_{s,r},{\overset{->}{\omega}}_{s}} \right)}}}$

where x^(a) is the annihilation density and d_(o) is

${d_{o}\left( {\overset{->}{l},\overset{->}{\omega}} \right)} = \frac{N_{detline}{{{\overset{->}{z}}_{1} - {\overset{->}{z}}_{2}}}^{2}}{D_{1}D_{2}\cos \; \theta_{{\overset{->}{z}}_{1}}\cos \; \theta_{{\overset{->}{z}}_{2}}\Delta \; l_{s}}$

where line segment end points {right arrow over (z)}₁, {right arrow over(z)}₂ and ray marching step size Δl_(s) can be unambiguously determinedfrom the line of a place vector {right arrow over (l)}_(s,r) anddirection vector {right arrow over (w)}, and from the arrangement of thedetector elements 13′, 13″ and the volume to be reconstructed. It has tobe noted that in the normal case {tilde over (y)}_(L) ⁰ is equivalent tothe above introduced {tilde over (y)}_(L) ^(surf) but e.g. in the caseof using multiple importance sampling the above expression changesfurther and {tilde over (y)}_(L) ^(surf) is expressed in a morecomplicated way as introduced above. Application of multiple importancesampling to the description of photon transport is detailed herebelow.

In an embodiment of the inventive method, multiple importance samplingis applied to the computation of photon pair transport, i.e. this phaseis of partly scattering type. Consequently input-driven sampling is usedbesides output-driven sampling. In input-driven sampling, first N_(v)volume points {right arrow over (v)}_(s) are obtained with a densitythat is proportional to the activity. Then, line directions are sampledby placing uniformly distributed hit points {right arrow over (z)}₁ ondetector element surface D₁ of the detector element being farther fromthe two modules of coincidence. Volume point {right arrow over (v)}_(s)and surface point {right arrow over (z)}₁ unambiguously define a line ofdirection {right arrow over (w)}_(s), which identifiesdetector elementpair L by the intersected detector element surface on the closerdetector element. Putting just a single sample on each detector elementsurface D₁, the expected number of hits in LOR L is

${\overset{\sim}{y}}_{L}^{i} = {\sum\limits_{x = 1}^{N_{v}}\; \frac{{x^{o}\left( {\overset{->}{v}}_{s} \right)}/\left( {2\; \pi} \right)}{d_{1}\left( {{\overset{->}{v}}_{s},{\overset{->}{\omega}}_{z}} \right)}}$

where density d_(i) is:

${d_{1}\left( {\overset{->}{v},\overset{->}{\omega}} \right)} = \frac{N_{v}{x^{a}\left( \overset{->}{v} \right)}{{\overset{->}{v} - {\overset{->}{z}}_{1}}}^{2}}{{XD}_{1}\cos \; \theta_{{\overset{->}{z}}_{1}}}$

where X is the total activity.

In FIG. 7 the input-driven scheme is demonstrated. A singlecomputational thread of the input-driven photon pair transportcalculation samples points {right arrow over (v)} in proportion to theannihilation density x^(a)({right arrow over (v)}) and processes a linecrossing this point for each LOR as shown in the figure. FIG. 7 shows avoxel 22 which contributes through exemplary lines 20 to many of thedetector elements 12 of the detector devices 10′, 10″.

In an input-driven method, the positron density. i.e. the activity istaken into account during sampling. In an output-driven method, on theother hand, the activity is identified at the end of the computationprocess, thus the activity cannot be taken into account in the densityof sampling. Consequently, an input-driven approach is typically moreaccurate when the activity distribution has a very high variation, forexample, when the activity is concentrated in a small region. However,the output-driven method is better than the input-driven inreconstructing colder—low-activity—regions.

In preferred embodiments involving multiple importance sampling, acombination of output-driven and input-driven schemes is used, whichkeeps the advantages of both. The above introduced output-driven andinput-driven algorithms kept unchanged, only the sample weights aremodified. Then, the estimators of different techniques are simply addedas follows. The combined sample density is

${d\left( {\overset{->}{l},\overset{->}{\omega}} \right)} = {\frac{N_{detline}{{{\overset{->}{z}}_{1} - {\overset{->}{z}}_{2}}}^{2}}{D_{1}D_{2}\cos \; \theta_{z_{1}}\cos \; \theta_{z_{2}}\Delta \; l_{s}} + \frac{N_{v}{x^{a}\left( \overset{->}{v} \right)}{{\overset{->}{v} - {\overset{->}{z}}_{1}}}^{2}}{{XD}_{1}\cos \; \theta_{z_{1}}}}$

With this sample density, the modified output-driven projection andinput-driven projection are

${\overset{\sim}{y}}_{L}^{o} = {\sum\limits_{s = 1}^{N_{detline}}\; {\sum\limits_{c = 1}^{N_{detline}}\; {\frac{{x^{a}\left( {\overset{->}{l}}_{s,r} \right)}/\left( {2\; \pi} \right)}{d\left( {{\overset{->}{l}}_{x,r},{\overset{->}{\omega}}_{s}} \right)}\mspace{14mu} {and}}}}$${{\overset{\sim}{y}}_{L}^{i} = {\sum\limits_{x = 1}^{N_{x}}\; \frac{{x^{a}\left( v_{x} \right)}/\left( {2\; \pi} \right)}{d\left( {{\overset{->}{v}}_{x},{\overset{->}{\omega}}_{s}} \right)}}},$

respectively. The final estimator is the sum of the two estimators:{tilde over (y)}_(L) ^(surf)={tilde over (y)}_(L) ⁰+{tilde over (y)}_(L)^(i).

For the sake of simplicity, in the above embodiment the photon pairtransport is detailed with the assumption that there is neitherscattering nor random events. However, the proposed invention iscompatible with various types of scattering compensation algorithms (M.Magdics, L. Szirmay-Kalos, B. Tóth, Á. Csendesi, and A. Penzov. Scatterestimation for PET reconstruction, in: Proceedings of the 7thinternational conference on Numerical methods and applications, NMA '10,pp. 77-86, Berlin. Heidelberg, Springer-Verlag (2011)), random event,dead-time compensation algorithms and can involve physically accuratedetector elements or crystals, gamma-photon absorption models and can beextended to take binned time-of-flight data into account.

In the following the step corresponding to the response of a detectorelement is detailed Detector elements are parts of planar detectordevices or modules and form a 2D discrete grid. A single detectorelement can be identified by a pair of integer coordinates c=(c_(x),c_(y)) in this grid. Coordinate pairs of detector elements are denotedby bold letters in the expressions below. Photons may get scattered indetector elements before they get finally absorbed. The number ofreported hits due to absorption may be different in different detectorelements due to the variation of crystal sensitivity of the crystal of adetector element and the associated electronics. In a preferredembodiment, in order to reduce the data needed to model detector devicesincluding detector elements, the detector evaluation step comprises adetector blurring step for describing photon propagation from thesurface of a detector element to absorption of the photon, and adetector measurement step for describing an evaluation from theabsorption of the photon to the output of detection, wherein the systemmatrix is factorized into sub-matrices also according to the detectorblurring step and the detector measurement step.

The photon transport is assumed to be translation invariant so it can bemodeled by a single, incident direction dependent blurring kernel. Tohandle crystal sensitivity of the detector elements and electronicsconnected to the detector elements, each detector element c ischaracterized by a detector—or crystal—sensitivity value ε_(c). Thedetector sensitivity value is the expected number of events reported inthis detector element or crystal by the output of the measuring system,provided that a photon has been absorbed here. It is assumed that thisparameter is available for every detector elements, or can be obtainedby indirect measurements.

Scattering inside the detector elements can be modeled by a crystaltransport probability p_(i→c)({right arrow over (w)}) that specifies theconditional probability that a photon is absorbed in detector element cprovided that it arrives at the surface of detector element i fromdirection {right arrow over (w)} as demonstrated in FIG. 8.

It is supposed that the detector elements are small with respect to thedistance of the detector devices, so direction {right arrow over (w)} ofthe LOR is constant for those detector elements which are in theneighborhood of c and where p_(i→c) is not negligible.

The sum of the crystal transport probabilities is the detectionprobability, i.e. the probability that the photon does not get lost, orfrom a different point of view, the photon does not leave the modulewithout absorption:

${v\left( \overset{->}{\omega} \right)} = {\sum\limits_{c}\; {p_{i->c}\left( \overset{->}{\omega} \right)}}$

Let us consider a LOR connecting crystals c₁ and c₂ and having direction{right arrow over (w)}. The expected number of hits in this LOR is:

${\overset{\sim}{y}}_{L{({c_{1},c_{2}})}}^{\det} \approx {ɛ_{c_{1}}ɛ_{c_{2}}{\sum\limits_{i}\; {\sum\limits_{j}{{\overset{\sim}{y}}_{L{({i,j})}}^{surf}{p_{i->c_{1}}\left( {\overset{->}{\omega}}_{ij} \right)}{p_{j->c_{2}}\left( {\overset{->}{\omega}}_{ij} \right)}}}}}$

It is clear from the expression that the detector response calculationis a convolution operation in four-dimensional LOR space. The space ofthe LORs is four-dimensional since a LOR is determined by its two endpoints, each given by a coordinate pair on the detector surfaces.

The detector elements of an exemplary PET system are LYSO crystals, theaverage free path of gamma-photons of 511 keV is about 5 mm, whiledetector elements are packed at about 1.12 mm distance, thus withnon-negligible probability, photons of larger incident angles can beabsorbed farther than the tenth neighbor of the detector element wherethe photon entered the detector device. Thus, the convolution of theabove expression should cover at least 20×20 detector elements on bothends of the LOR, which makes the number of terms to be summed greaterthan 20⁴. The prohibitive cost of such large LOR convolution cannot bereduced by filtering in the frequency domain since the expression alsodepends on the incident directions, and therefore spatially variant inLOR space. Instead, to reduce the computation time, the long sum isevaluated by Monte Carlo estimation taking N_(o) random offset samples,where a sample is associated with a pair of offset vectors t₁=c₁−i andt₂=c₂−j. The density of samples is

${d_{o}\left( {i,j,c_{1},c_{2}} \right)} = {\frac{N_{o}{p_{i->c_{1}}\left( {\overset{->}{\omega}}_{ij} \right)}{p_{j->c_{2}}\left( {\overset{->}{\omega}}_{ij} \right)}}{{v_{1}\left( {\overset{->}{\omega}}_{ij} \right)}{v_{2}\left( {\overset{->}{\omega}}_{ij} \right)}}.}$

Relaxed sample sets are preferably pre-generated sample sets containingN_(o) samples by minimizing the difference between their distributionand the real distribution, which is measured or calculated with atransport Monte Carlo simulation off-line. These pre-generated samplesare preferably used in the detector blurring step. Thanks to thetranslation invariance, it is sufficient to carry out the measurement orthe off-line simulation and sample generation only once and use the sameset of samples in each LOR.

Multiple importance sampling can also be adapted to the above detailedstep as follows. In this case, in addition to the samples discussedabove also input-driven contributions are taken into account: LORsL(i(N_(o)+1), j(N_(o)+1)), . . . , L(i(N_(o)+N_(i)), j(N_(o)+N_(i))) aresampled proportionally to {tilde over (y)}_(L) ^(surf), where LOR L(i,j) is selected with density

${d_{i}\left( {i,j} \right)} = {N_{i}{\frac{{\overset{\sim}{y}}_{L{({i,j})}}^{surf}}{Y}.}}$

where N_(i) is the number of input-driven LOR samples and Y=Σ_(ij){tildeover (y)}_(L(t,j)) ^(surf).

Using both input-driven and output-driven samples, i.e. apply multipleimportance sampling for the detector response phase, the estimator ofthe detector response is:

${\overset{\sim}{y}}_{L{({c_{1},c_{2}})}}^{\det} \approx {ɛ_{c_{1}}ɛ_{c_{2}}{\sum\limits_{x = 1}^{N_{r} + N_{v}}\; {\frac{{\overset{\sim}{y}}_{L{({{i{(s)}},{j{(s)}}})}}^{surf}{P_{{i{(x)}}->c_{1}}\left( {\overset{->}{\omega}}_{ij} \right)}{p_{{j{(s)}}->c_{2}}\left( {\overset{->}{\omega}}_{ij} \right)}}{{d_{i}\left( {{i(s)},{j(s)}} \right)} + {d_{ij}\left( {{i(s)},{j(s)},c_{1},c_{2}} \right)}}.}}}$

Herebelow, the details of a back projection are given as used in somepreferred embodiments of the invention. The back projection considersthe ratios of measured and estimated LOR values and executes the stepsof forward projection backwards in reverse order to update the voxelestimates. As the detector response phase computes LOR values from LORvalues, and the positron range phase voxel values from voxel values,their backwards execution is relatively simple. The reversed detectorresponse operation multiplies the ratios of measured and estimated LORvalues first with the detector element sensitivities and then blurs inLOR space with the detector blurring kernel.

The critical step of back projection is the geometric step, i.e. thephoton transport step, since it computes LORs from voxels in forwardprojection, and thus voxels from LORs in back projection. Asparallelization prefers output-driven algorithms, the geometric backprojection must be different from the forward projection. Furthermore,the geometric back projection is essential in the iterativereconstruction algorithm but positron range and even the detectorresponse may be skipped in this phase, making forward and backprojections unmatching with respect to each other. The application ofunmatched back projector can reduce the time needed by a singleiteration and can improve convergence, but skipping the detectorresponse makes the system less stable and accurate for low-dose andtherefore low-statistics measurements.

The positron transport step and/or the detector response step may bealso omitted or other simplified steps can be applied instead of theabove detailed positron transport step and detector response step in theforward projection. The elimination of these steps result only in thatthe transport of the positrons and the photon transport in the detectorsare not taken into account but this does not alter the type of the inputand the output of the forward and back projections, namely, while boththe input and output are voxel values in the positron transport step andboth the input and output are detector element data in the detectorresponse step.

In the following, voxel-driven, i.e. output driven geometric backprojection will be discussed. In order to handle the sensitivity of EMreconstruction scheme to inconsistent noises, some regularization schemeis worth to apply, e.g. total variation regularization causing minormodification of voxel updates for improving the emission density data.

Geometric back projection computes the scaling factors of voxels fromthe ratios of measured and computed LOR values as described by the backprojection equation introduced above, thus a gathering approach leads toa parallelization scheme where a thread is responsible for computing theupdate ratio of a single voxel from all relevant LORs.

In a thread corresponding to back projection, the elements in the columnof the system matrix associated with the current voxel are estimated bytaking discrete position samples inside the voxel. These sample pointsare generated using a basis function as the density duringpre-processing and are used for each voxel. Each detector element pairis considered and the surface of detector elements on the fartherdetector device is projected onto the plane of the other detectorelement via position samples. Note that with a fixed position sample,central projection is a homogeneous linear function whose parametersneed to be computed only once for every detector element pair. Centralprojection maps the rectangle onto another rectangle if modules areparallel and to a trapezoid if only the axial detector element edges areparallel but the tangential edges are not. Those other detector elementsurfaces are found to fall inside the projection, and thus form LORs forthis voxel. The intersection of the surface areas of projections anddetector element surfaces is used to estimate the solid angle in whichthe LOR can be seen from the sampling point. The above detailedsituation is demonstrated in FIG. 9. The figure shows a singlecomputational thread of the geometric back projection corresponding toLOR L, which takes point samples {right arrow over (v)} and projects thesurface of a detector element 13″ of the farther detector device 10″onto the plane of closer detector device 10′ to obtain the solid anglesof LORs. The figure shows a single point sample in {right arrow over(v)} and a pair of parallel detector devices 10′, 10″, but the threadprocesses all detector element pairs and all point samples associatedwith this voxel. If detector devices are not parallel, then the centralprojection of a rectangle is a trapezoid. The system matrix element canbe computed from the solid angle, the length of the line segment insidethe voxel, and the total attenuation of the line between two detectorelements. The attenuation is primarily caused by Compton scattering. Theresult of the attenuation is that some of the particles scatter out froma specific LOR. The attenuation can be preferably taken into account insome embodiments of the reconstruction method according to theinvention.

Note that by starting a contributing LOR for a voxel from the fartherdetector element, the projection of the farther detector element cancover several neighboring detector elements partly on the closerdetector device. In this case the contribution of the farther detectordevice can be estimated by the contribution of every contributingdetector elements by weighting.

The expectation maximization algorithm—used in the iteration steps inpreferred embodiments of the method according to the invention—is knownto be ill-conditioned, which means that enforcing the maximization of alikelihood function may result in a solution with drastic oscillationsand noisy behavior. This problem can be attacked by regularizationmethods that include additional information (P. J. Green: Bayesianreconstructions from emission tomography data using a modified EMalgorithm, IEEE Trans. Med. Im., Vol. 9, pp. 84-93 (1990)), e.g. as apenalty term. An appropriate penalty term is the total variation (TV) ofthe solution (M. Persson, D. Bone, H. Elmqvist: Total variation norm forthree-dimensional iterative reconstruction in limited view angletomography, Physics in Medicine and Biology, Vol. 46, pp. 853-866(2001)).

The accuracy of the system matrix elements used in forward projectionprimarily affects the accuracy of the reconstruction. System matrixelements—at least some of matrix elements of the sub-matrices—arepreferably computed on-the-fly by a numerical quadrature, for whichdiscrete random points are used as detailed above.

The method according to the invention can preferably use two types oferror reduction techniques, stochastic iteration (L. Szirmay-Kalos:Stochastic iteration for non-diffuse global illumination. ComputerGraphics Forum, Vol 18, pp. 233-244 (1999)), which is a filteringexecuted in the time-domain, and filtered sampling (J. Krivánek, M.Colbert: Real-time shading with filtered importance sampling. ComputerGraphics Forum, Vol. 27, pp. 1147-1154 (2008)), which is a filteringexecuted in the spatial domain. Both filtering schemes improve theaccuracy without essentially increasing the computation time.

Stochastic iteration uses different samples in different iteration stepsmaking sure that the expectation of the error is zero, which results incomputed LOR responses {tilde over (y)}_(L) that fluctuate around theirexpected value, and averages recent forward projection estimates duringthe iteration sequence. Concerning accuracy, stochastic iteration issimilar to iterating with the average of the system matrices, i.e. withthe matrix that would be computed using more samples.

Turning to the other above mentioned scheme: filtered sampling replacesthe integrand of forward projection by another function that has asimilar integral but a smaller variation, therefore, its integral can beestimated more precisely. The control loops of classical iteration andwith the application of filtered sampling are shown in FIG. 10 and FIG.11, respectively. The figure shows that forward projection F takes thefiltered voxels {circumflex over (x)}_(Y) computed as a filtering G ofthe output of the back projection B. If forward projection were ideal,then this loop would stabilize when B(F({circumflex over (x)}))=1, thusfiltered voxel value {circumflex over (x)} would be the ideal solutiondespite the included filtering. On the other hand, when forwardprojection is not ideal and the variation of its input poses problems,the variation of the input can be reduced with the added filter.

The performance of the different embodiments of the method according tothe invention is demonstrated by showing some exemplary results. Thepresented examples are produced with the CUDA Implementation that runson NVidia GeForce GTX 480 GPUs. The simulated or measured data have beenobtained assuming a known nanoPET/CT system (NanoPET/CT in-vivopreclinical imager, 2010,http://www.bioscan.com/molecular-imaging/nanopet-ct) that consists oftwelve square detector devices or modules consisting of 81×39 crystalsdetectors, i.e. detector elements. The exemplary detector elements areof surface size 1.12×1.12 mm². During reconstruction, registered CT datacan also be relied on, which can be used in object and materialdependent positron range and attenuation/scattering computations.

Three different mathematical phantoms are simulated by GATE(http://www.opengatecollaboration.org/) modeling the nanoPET system inorder to demonstrate the performance of an embodiment of the inventivemethod in geometric projection: an off-axis point source phantom of 0.1MBq activity and measured for 10 sec, a Derenzo phantom with roddiameters 1.0, 1.1, . . . , 1.5 mm in different segments, filled up with1.6 MBq activity and simulated for 100 sec, and a homogeneity phantomthat is built of 8 constant activity cubes having 0.6 MBq activity intotal and simulated for 20 sec. The cross sections of the Derenzo andhomogeneity phantoms are demonstrated in FIGS. 12A and 12B,respectively. The mathematical phantoms also make it possible to testand evaluate the factored steps separately. In the GATE modeling,physical effects are turned off and on to produce ideal geometricmeasurement and data simulating physical effects as well. In idealgeometric simulations, the LYSO crystals—i.e. the detector elements—ofnanoPET/CT are replaced in the GATE by a very dense material,back-to-back gamma sources are placed in the voxels, and the phantomattenuation and scattering are turned off. In physically plausiblesimulations, i.e. when physical effects are taken into account, thedetector model is made realistic, i.e. LYSO crystals are specified forGATE, isotope sources are defined, and attenuation and scattering arealso allowed to happen. From these physical phenomena, positron rangeand detector response are critical in small animal PET, attenuation hasa moderate effect and scattering just a minor one.

The off-axis point source is expected to favor the input-drivensampling, the homogeneity phantom the output-driven sampling, while theDerenzo is a typical case in between the two previous extreme cases.

In FIGS. 13A-13C the Cross Correlation (CC) error of the reconstructionis shown as the function of the computation time for the three aboveintroduced phantoms, respectively. The definition of CC is given athttp://mathworld.wolfram.com/CorrelationCoefficient.html and denoted byr. The number r is the cross-correlation coefficient of a currentlyobtained reconstruction and the original phantom. The depicted quantityis the CC Error with the definition: CC Error=100*(1−|r|). A zero valueof the CC Error would correspond to the exact match between them (r1=),100% would correspond to a random guess which is not correlated with theexact result (r=0).

In FIGS. 13A-13C the performances of fully output-driven and fullyinput-driven embodiments are compared to an embodiment combining theoutput-driven and input-driven schemes (Multiple Importance Sampling).In FIGS. 13A-13C the data corresponding to output-driven, input-drivenand combined schemes are denoted by continuous, long-dashed andshort-dashed lines, respectively. For the different phantoms, differentparameters are used in the different embodiments of the reconstructions.For the point phantom depicted in FIG. 13A, N_(detline)=20, N_(march)=99for the fully output-driven scheme, N_(v)=10⁵ for the fully input-drivenscheme, and N_(detline)=4. N_(march)=25 and N_(v)=10⁴ for the combinedscheme. For the Derenzo phantom depicted in FIG. 13B, N_(detline)=6,N_(march)=72 for the fully output-driven scheme, N_(v)=10⁶ for the fullyinput-driven scheme, and N_(detline)=4, N_(march)=36 and N_(v)=10⁴ forthe combined scheme. For the homogeneity phantom depicted in FIG. 13C,N_(detline)=2, N_(march)=54 for the fully output-driven scheme,N_(v)=106 for the fully input-driven scheme, and N_(detline)=1,N_(march)=36 and N_(v)=10⁵ for the combined scheme. Note that for thefully output-driven algorithms N_(v)=0 and for fully input-drivenalgorithms N_(detline)=0 and N_(march)=0 always hold.

The following general consequences can be drawn based on thecalculations of the various embodiments. As expected, the fullyinput-driven approach can perform better than the only output-drivenscheme only for the point phantom, while the fully output-drivenapproach is better than the input-driven one for the homogeneity phantomand the Derenzo phantom. However, for all three phantoms, both of theabove embodiments, i.e. the fully input-driven, as well as the fullyoutput-driven, can be outperformed by the above proposed combinedmethod, i.e. the embodiment of the inventive method which preferablyapplies multiple importance sampling. By the application of the combinedmethod, the number of line samples N_(detline) and the number of stepsN_(march) can be reduced, and relatively few N_(v) volume points shouldbe added to compensate the missing samples at important regions. Notethat for about 10⁶ voxels, 10⁴ added volume samples seem satisfactory.The reason is that at high activity, point like features, a few samplescan result in accurate projections, while at larger homogeneous regions,the output-driven method already does a fairly good job, which is notcompromised by the added rare volume samples. The random selection andprojection of 10⁴ volume samples onto 180 million LORs needs just 0.3seconds on the GPU, which is negligible with respect to the times ofother processing steps.

FIGS. 14A and 14B demonstrate the reconstruction result of a physicalmeasurement taken by a known nanoPET/CT system. Here the targetresolution is 324×315×315 voxels. The difference of the reconstructionmethods performed for obtaining FIGS. 14A and 14B is that the resultshown in FIG. 14A is obtained by an embodiment of the inventioncomprising the above detailed photon transport step but not comprisingthe above detailed detector response step, while both that photontransport and detector response steps are utilized for thereconstruction result shown in FIG. 14B. N_(detline)=4 Poisson-diskdistributed line samples per LOR are used, N_(march)=162 point samplesper line in LOR-driven forward projection and N_(v)=10⁵ volume pointsamples in voxel-driven forward projection, which are combined withmultiple importance sampling for both of FIGS. 14A and 14B. Forreconstructing the data shown by 14B, the detector response was computedwith 64 output-driven samples in forward projection. In the exampleshown in FIG. 14B, the forward and back projections are unmatching sincein the detector response step of the back projection a simplifieddetector model is used which is proposed in M. Magdics, B. Tóth, L.Szécsí, B. Csébfalvi, L. Szirmay-Kalos, Á. Szlavecz, G. Hesz, B. Benyó,Á. Cserkaszky, D. Légrády, Sz. Czifrus, Á. Wirth, B. Kári, J. Lantos, G.Patay, D. Völgyes, P. Major, G. Németh. T. Bükki, and B. Domonkos:Detector modeling techniques for pre-clinical 3D PET reconstruction onthe GPU, in: The 11th International Meeting on Fully Three-DimensionalImage Reconstruction in Radiology and Nuclear Medicine, pp. 375-8(2011). In the reconstruction of FIG. 14A the above simplified detectormodel is used both in the forward and back projections. Gaussianfiltering is applied to the reconstructed volume between iteration stepswith standard deviation equal to the double of the linear size of avoxel. Stochastic iteration was also turned on after the fifth iterationstep, i.e. took statistically independent sets of random samples andcomputed the average of expected LOR hits. Total variationregularization is applied during the iteration. The positron transportstep is not carried out in the reconstruction of the FIGS. 14A and 14B.

TABLE 1 EM OSEM, 6 subsets Factored step 128³ 256³ 128³ 256³ Positronrange with 3 0.6 sec   2 sec 0.6 sec    2 sec materials Geometric FP (4line  7 sec 58 sec 1 sec  9 sec samples) Geometric BP (1 point sample 12sec 90 sec 2 sec 15 sec per voxel) LOR filtering (64 samples)  9 sec  9sec 1.4 sec   1.4 sec  Total 29 sec 159 sec  5 sec 27 sec

Table 1 presents the execution times of the factored steps needed toexecute a single EM (Expectation Maximization) iteration and to processa subset in OSEM (Ordered Subset Expectation Maximization), whenreconstruction is performed at 128³ and 256³ voxel resolutions,respectively. The illustrated reconstruction times correspond to anunmatched back projection scheme, i.e. detector response simulation andpositron range simulation are not taken into account in the course ofthe back projection. Note that the geometric back projection takeslonger than forward projection and dominates the computation time, whichis due to the fact that the voxel driven, i.e. output-driven, backprojection visits all relevant LORs for each voxel in a thread, but theLOR data structure is essentially a four-dimensional array (an arrayelement is identified by the four indices of the four spatialcoordinates determines a certain LOR) which cannot be as efficientlyplaced in textures as 3D voxels. Unlike geometric back projection,forward projection is LOR driven in connection with the data shown inthe table, i.e. similarly output-driven as the back projection. In a LORdriven forward projection all relevant voxels are accessed for each LORin a thread, and the 3D texture cache can be highly utilized.

If further speed-ups are needed, more than one GPU card can be used. Ina multi-GPU solution, a single GPU card maintains just a portion of thevoxel array and computes the geometric forward and back projections asif only these voxels existed. Between the forward and back projections,LOR images are exchanged and every GPU adds the LOR values of othercards to its own result. This multi-GPU parallelization linearly speedsup geometric forward and back projections and its overhead is only thetime needed for the exchange and the addition of the computed LORvalues.

Scattering and random effects can be handled additively as described inM. Magdics, L. Szirmay-Kalos, B. Tóth, A. Csendesi, and A. Penzov:Scatter estimation for PET reconstruction, in: Proceedings of the 7thinternational conference on Numerical methods and applications, NMA '10,Springer-Verlag, Berlin, Heidelberg, pp. 77-86 (2011). The simulatedfactors are addressed one by one, and for each of them, a GPU optimizedsolution is found from the alternatives of on-the-fly computation, MonteCarlo estimation, and the usage of pre-computed, measured or simulateddata.

The invention is, of course, not limited to the preferred embodimentsdescribed in details above, but further versions, modifications anddevelopments are possible within the scope of protection defined by theclaims.

1. A method for tomographic reconstruction from measurement data, themethod comprising the steps of: obtaining the measurement datacorresponding to activity of particle emissions of a volume from atomographic imaging system, and performing at least one iteration stepto obtain emission density data of particle emissions of the volume fromthe measurement data, wherein the at least one iteration step comprisesa forward projection being a projection from the emission density datato the measurement data, and a back projection being to a projectionfrom the measurement data to the emission density data, and the at leastone iteration step is carried out in a parallel hardware architecture,characterized in that both the forward projection and the backprojection are of at least partially gathering type, and the measurementdata is of binned type.
 2. The method according to claim 1,characterized in that the forward projection comprises a photontransport step for describing photon propagation from a point oforigination of the photon to a surface of a detector element, and adetector evaluation step for describing photon propagation from thesurface of the detector element to an output of detection, wherein themeasurement data is obtained from a signal of a detector devicecomprising a plurality of the detector elements, the measurement dataand the emission density data are connected by a system matrix, whereinthe system matrix is factorized into sub-matrices according to thephoton transport step and the detector evaluation step.
 3. The methodaccording to claim 2, characterized in that the photon transport stepand/or the detector evaluation step are of at least partly scatteringtype.
 4. The method according to claim 2 or 3, characterized in that inthe photon transport step the surfaces of the detector elements aresampled by a Poisson-disk distribution.
 5. The method according to anyof claims 2 to 4, characterized in that the detector evaluation stepcomprises a detector blurring step for describing photon propagationfrom the surface a detector element to absorption of the photon, and adetector measurement step for describing an evaluation from theabsorption of the photon to the output of detection, wherein the systemmatrix is factorized into sub-matrices also according to the detectorblurring step and the detector measurement step.
 6. The method accordingto claim 5, characterized in that pre-generated samples are used in thedetector blurring step, the pre-generated samples being calculated witha Monte Carlo simulation.
 7. The method according to any of claims 2 to6, characterized in that each detector element is characterized by adetector sensitivity value for handling crystal sensitivity of thedetector elements and electronics connected to the detector elements. 8.The method according to any of claims 2 to 7, characterized in that themethod is for positron emission tomography, and the forward projectioncomprises a positron transport step for describing propagation of apositron from a point of origination to a point of annihilation into apair of photons, and the system matrix is factorized into sub-matricesalso according to the positron transport step.
 9. The method accordingto any of claims 2 to 8, characterized in that at least some of matrixelements of the sub-matrices of the system matrix are computedon-the-fly.
 10. The method according to any of claims 1 to 9,characterized in that the forward projection and the back projection areunmatching with respect to each other.
 11. The method according to anyof the claims 1 to 10, characterized in that the parallel hardwarearchitecture is at least one graphic processing unit comprising aplurality of scalar processors.
 12. The method according to any ofclaims 2 to 11, characterized in that the parallel architecturecomprises parallel threads and at least a part of the threads hasidentical control path.
 13. The method according to claim 12,characterized in that in at least a part of the steps algorithmscorresponding to the step are carried out in parallel threads being ofidentical length.
 14. The method according to any of claims 1 to 13,characterized in that in the forward projection a filtering is executedin time-domain and/or spatial domain.
 15. The method according to any ofclaims 1 to 14, characterized in that a regularization scheme is appliedin back projection for improving the emission density data.
 16. Acomputer readable medium comprising a computer readable program forprocessing tomographic reconstruction from measurement data, wherein thecomputer readable program when executed on a computer causes thecomputer to perform the steps of the method according to any of claims 1to
 15. 17. A system for processing tomographic reconstruction frommeasurement data, characterized by comprising an obtaining unit forobtaining the measurement data corresponding to activity of particleemissions of a volume from a tomographic imaging system; and aniteration unit for performing at least one iteration step to obtainemission density data of particle emissions of the volume from themeasurement data wherein the iteration unit comprises a forwardprojection unit for performing a forward projection for projecting theemission density data to the measurement data; and a back projectionunit for performing a back projection for projecting the measurementdata to the emission density data; and wherein the iteration unitcomprises a parallel hardware architecture, the forward projection andthe back projection are of at least partially gathering type, and themeasurement data is of binned type.